---
author: Marcus Rohrmoser
categories:
- en
- development
date: "2010-05-21T20:10:25+00:00"
tags:
- Atom
- expat
- libxml2
- regular expression
- RELAX NG
- RelaxNG
- rest
- restful
- schema
- trang
- W3C
- XML
- xmllint
title: 'XML Toolbox: RELAX NG & trang'
type: post
url: /2010/05/xml-toolbox-relax-ng-trang/
yourls_shorturl:
- http://s.mro.name/2g
---
When handling [RESTful APIs][1], for example, you may want to [validate][2] the response [XML][3], which in most cases follows a custom format.

I typically use tools that come preinstalled on every Mac: fire an [HTTP GET][4] request with
[`curl`](http://curl.haxx.se/) and immediately check the response with
[`xmllint`](http://xmlsoft.org/xmllint.html):

<pre class="line-numbers"><code class="language-shell-session">$ curl http://www.heise.de/newsticker/heise-atom.xml | xmllint --format --schema myschema.xsd -
</code></pre>

But I just don't like creating and editing [W3C XML Schemas][5]: the notorious angle brackets hurt my
eyes, and the redundant element names bury the actual content in tons of repetitive text. Nor do I
like clicking through graphical schema editors and getting lost hunting for hidden settings and
property dialogs.

A minimal, naive W3C Schema that validates the [Atom][6] feed above (generated from the feed itself
with trang, see below) looks like this:

{{< figure  src="/wp-content/uploads/2010/05/Bildschirmfoto-2010-05-21-um-21.05.49.png" caption="Naive Atom W3C Schema"  width="300"  height="294" >}}

This is where [RELAX NG][7] comes in, especially its "[compact syntax][8]", which is just what I
like: a concise, [BNF-ish][9] notation. It was designed by [Murata Makoto][10] and [James Clark][11],
technical lead of the XML Working Group back when XML was created and author of the famous [expat
parser][12].

The same schema in RELAX NG compact syntax boils down to ½ the lines and about ⅓ of the characters,
without a single angle bracket:

<pre class="line-numbers"><code class="language-relaxng">default namespace = "http://www.w3.org/2005/Atom"

start =
  element feed {
    title,
    element subtitle { text },
    link+,
    updated,
    element author {
      element name { text }
    },
    id,
    element entry { title, link, id, updated }+
  }
title = element title { text }
link =
  element link {
    attribute href { xsd:anyURI },
    attribute rel { xsd:NCName }?
  }
updated = element updated { xsd:dateTime }
id = element id { xsd:anyURI }
</code></pre>

And since [libxml2][13], and therefore xmllint, supports RELAX NG in its regular (XML) syntax, you can validate just like in the beginning, only with a much more editable schema:

<pre class="line-numbers"><code class="language-shell-session">$ curl http://www.heise.de/newsticker/heise-atom.xml | xmllint --format --relaxng myschema.rng -
</code></pre>
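Note that xmllint expects the schema in RELAX NG's regular XML syntax (`.rng`), so a compact `.rnc` schema has to be converted first (with trang, see below). To give an idea of what that syntax looks like, and of how the validation result can drive a script, here is a minimal sketch: it writes a hypothetical, heavily cut-down Atom schema (only `title` and `updated`) plus two sample feeds to a temporary directory and relies on xmllint's exit status, which is zero only if the document validates. `--noout` suppresses echoing the document itself; the helper name `check` is made up for this example.

```shell
#!/bin/sh
# Sketch: a hypothetical, cut-down Atom schema in RELAX NG XML syntax
# plus two sample feeds, all written to a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# ns and datatypeLibrary are inherited by the nested patterns
cat > "$dir/mini.rng" <<'EOF'
<element name="feed" ns="http://www.w3.org/2005/Atom"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="title"><text/></element>
  <element name="updated"><data type="dateTime"/></element>
</element>
EOF

cat > "$dir/good.xml" <<'EOF'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <updated>2010-05-21T20:10:25+00:00</updated>
</feed>
EOF

# same feed, but <updated> is not a valid xsd:dateTime
cat > "$dir/bad.xml" <<'EOF'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <updated>yesterday</updated>
</feed>
EOF

# validate FILE against mini.rng; the exit status is xmllint's,
# zero only if FILE is valid (diagnostics go to stderr)
check() {
  xmllint --noout --relaxng "$dir/mini.rng" "$1"
}

for f in good bad; do
  if check "$dir/$f.xml" 2>/dev/null; then
    echo "$f: valid"
  else
    echo "$f: invalid"
  fi
done
```

The same `if xmllint ...; then` pattern works on the `curl` pipeline above, because a pipeline's exit status is that of its last command.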

### [trang][14]

is a schema converter for RELAX NG, written in Java, which I wrapped in a tiny [bash][15] script:

<pre class="line-numbers"><code class="language-sh">#!/bin/bash
# run the trang.jar next to this script, passing all arguments through unchanged
exec java -jar "$(dirname "$0")/trang-20090818/trang.jar" "$@"
</code></pre>

Instead of writing a new schema from scratch, it's much more convenient to let trang infer one from a bunch of sample XML files:

<pre class="line-numbers"><code class="language-shell-session">$ trang *.xml myschema.rnc
</code></pre>

then refine the resulting compact schema by hand and finally convert it into the regular (XML) syntax:

<pre class="line-numbers"><code class="language-shell-session">$ trang myschema.rnc myschema.rng
</code></pre>

Trang also doubles as a schema pretty-printer: just convert from compact to regular syntax and back.

**But note: trang converts RELAX NG into W3C XML Schema, not vice versa.**

### Deep validation

Validating XML documents shouldn't stop at the structure of elements and attributes; it can also leverage [XML Schema Datatypes][16] and constrain values, e.g. with [regular expressions][17]:

<pre class="line-numbers"><code class="language-relaxng">element uuid {
  xsd:string {
    ## A UUID
    pattern =
      "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
  }
}
</code></pre>
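One detail to keep in mind when reusing such a pattern outside the schema: XML Schema regular expressions are implicitly anchored, i.e. they must match the whole value, whereas tools like `grep` happily match a substring. A minimal POSIX sh sketch (the helper `is_uuid` is hypothetical) therefore adds `^` and `$` explicitly:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 matches the UUID pattern from the
# schema above. XSD patterns must match the whole value, so the
# expression is anchored with ^ and $ for grep.
# Assumes a single-line value, since grep matches line by line.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

for v in 550e8400-e29b-41d4-a716-446655440000 \
         x550e8400-e29b-41d4-a716-446655440000; do
  if is_uuid "$v"; then echo "$v: ok"; else echo "$v: no match"; fi
done
```

Without the anchors, the second value (with its leading `x`) would slip through `grep`, although the schema rejects it.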

Besides patterns, datatypes also take range constraints:

<pre class="line-numbers"><code class="language-relaxng">element year {
  xsd:unsignedShort { minInclusive = "1900" maxInclusive = "2100" }
}
</code></pre>
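`minInclusive` and `maxInclusive` are inclusive bounds, so 1900 and 2100 themselves are allowed. A sketch to check the boundaries with xmllint, assuming a hand-written XML-syntax equivalent of the compact snippet above (roughly what trang would produce; the file name `year.rng` and the helper `check_year` are made up for this example):

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# XML-syntax equivalent of the compact year element above
cat > "$dir/year.rng" <<'EOF'
<element name="year" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="unsignedShort">
    <param name="minInclusive">1900</param>
    <param name="maxInclusive">2100</param>
  </data>
</element>
EOF

# succeeds if <year>$1</year> validates against year.rng;
# xmllint reads the document from stdin ("-")
check_year() {
  printf '<year>%s</year>\n' "$1" |
    xmllint --noout --relaxng "$dir/year.rng" - 2>/dev/null
}

for y in 1899 1900 2100 2101; do
  if check_year "$y"; then echo "$y: valid"; else echo "$y: invalid"; fi
done
```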

P.S.: For a more complete Atom RELAX NG schema see [this one][18], the informative compact schema in Appendix B of [RFC 4287][6], or ask your search engine of choice.

 [1]: http://en.wikipedia.org/wiki/Representational_State_Transfer
 [2]: http://www.w3.org/TR/REC-xml/#dt-valid
 [3]: http://en.wikipedia.org/wiki/XML
 [4]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.3
 [5]: http://en.wikipedia.org/wiki/XML_Schema_(W3C)
 [6]: http://www.ietf.org/rfc/rfc4287.txt
 [7]: http://en.wikipedia.org/wiki/RELAX_NG
 [8]: http://relaxng.org/compact-tutorial-20030326.html
 [9]: http://en.wikipedia.org/wiki/Ebnf
 [10]: http://en.wikipedia.org/wiki/Murata_Makoto
 [11]: http://www.jclark.com/
 [12]: http://en.wikipedia.org/wiki/Expat_(XML)
 [13]: http://xmlsoft.org/
 [14]: http://www.thaiopensource.com/relaxng/trang.html
 [15]: http://en.wikipedia.org/wiki/Bash
 [16]: http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
 [17]: http://en.wikipedia.org/wiki/Regular_expression
 [18]: http://www.asahi-net.or.jp/~eb2m-mrt/atomextensions/atom.rnc
